We are a group of 4 members:
Aishwarya Sinhasane - avsinhas@iu.edu (In picture, Left top)
Himanshu Joshi - hsjoshi@iu.edu (In picture, Right bottom)
Sreelaxmi Chakkadath - schakkad@iu.edu (In picture, Left bottom)
Sumitha Vellinalur Thattai - svtranga@iu.edu (In picture, Right top)
The objective of our project is to classify images as either dogs or cats and, additionally, to locate the cat/dog within the image. Although the task is trivial for human eyes, computers find it hard to distinguish between images because of a plethora of factors, including cluttered backgrounds, illumination conditions, deformations, and occlusions, among several others. We plan to build an end-to-end machine learning model that helps computers differentiate between cat and dog images with better accuracy and attains the goal of object detection by also predicting the bounding boxes.
The whole project was completed in multiple phases. In the first phase, we experimented with various techniques such as SGD, AdaBoost, and gradient boosting, and chose gradient boosting as our baseline model. Linear regression was used to detect bounding boxes.
During the second phase, we implemented logistic and linear regression from scratch. We also built a multi-layer perceptron and calculated the accuracy and loss per epoch for the classification and regression tasks. In addition, we built a multi-headed predictor that computes a combined loss from the classification and regression tasks and uses it to optimize the weights and biases of the network. We also fine-tuned the model to decrease the loss.
In this phase, we extended the baseline and implemented a complex loss function (CXE + MSE) using homegrown linear and logistic regression. Additionally, we used transfer learning with EfficientDet [D0-D7] to train our classifier and regressor. We tuned the head layer and modified the learning rate, which yielded a mAP of 0.16 for D0 and 0.09 for D7; the mAP for D7 is lower because it was trained for fewer epochs.
We also built a multi-headed fully convolutional neural network (FCN), which gave us a test accuracy of 61%. Comparing EfficientDet with the FCN, the FCN performs better in our case, though the performance of EfficientDet D7 might improve with fine-tuning and more epochs.
Built multiple models (AdaBoost, stochastic gradient descent, and gradient boosting) for classification; gradient boosting was chosen as the baseline
Built a linear regression model for bounding box detection; fixed the same as a baseline
Built a homegrown logistic regression for image classification
Extending from phase 0, built a homegrown linear regression and combined the two loss functions (CXE + MSE)
Built a multi-layer perceptron using both the sequential and OOP APIs; fine-tuned the model with data augmentation, dropout layers, and regularization
Built an MLP that combined the classifier and regressor loss functions, which were then backpropagated to the optimizer for tuning the weights and biases
Experimented with fully convolutional neural network
We have completed the following tasks in phase 3:
Homegrown linear and logistic regression
Complex loss function (CXE + MSE) using the homegrown models; this was used to fine-tune the model in each epoch
Used EfficientDet [D0-D7] to predict the image class and bounding box; compared the performance across D0-D7
Implemented a convolutional neural network for image classification and box detection
Fine-tuned the model by modifying layers, adding dropout layers, and adding regularization
Built a multi-headed cat-dog detector that combines the loss functions (CXE + MSE) and uses the combined loss to fine-tune the weights and biases
Visualized the results using TensorBoard
Compared the performance of EfficientDet and the FCN
The data we use is a Kaggle data set. We use two files: one for the images and the other for the bounding boxes:
The images are taken from cadod.tar.gz
The bounding-box information for the images is in the cadod.csv file
Image information (cadod.tar.gz):
There are ~13K images of various sizes and aspect ratios; the majority of the images are 512 x 384
All the images are in RGB
The bounding boxes of the images are stored in the cadod.csv file
Attributes of the Boundary File (cadod.csv):
This file has information about the image box coordinates
There are about 20 features in this data set:
- 15 numerical features: the image ID, the coordinates of the bounding boxes, and the normalized coordinates of the bounding boxes
- 5 categorical features: information about occlusion, depiction, truncation, etc.
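Since the box coordinates in cadod.csv are normalized to [0, 1], they must be scaled by the image width/height before drawing. A minimal sketch of that conversion (the column order XMin, YMin, XMax, YMax matches the arrays used later in this notebook; the function name is ours):

```python
def to_pixels(xmin, ymin, xmax, ymax, w, h):
    """Convert normalized [0, 1] box coordinates to pixel coordinates."""
    return (xmin * w, ymin * h, xmax * w, ymax * h)

# A 512 x 384 image with a box covering the central half of each axis:
print(to_pixels(0.25, 0.25, 0.75, 0.75, 512, 384))  # (128.0, 96.0, 384.0, 288.0)
```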
Since the data set is very large, the phase 3 tasks above were carried out only on a subset of the data. Hence the results are directional rather than definitive.
from collections import Counter
import glob
import os
import tarfile
import warnings

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from PIL import Image
from scipy.stats import randint
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV, ShuffleSplit,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from tqdm import tqdm
def extract_tar(file, path):
    """
    Extract a tar.gz archive to the specified location.

    Args:
        file (str): path where the archive is located
        path (str): path where you want to extract
    """
    files_extracted = 0
    with tarfile.open(file) as tar:
        for member in tqdm(tar.getmembers()):
            if os.path.isfile(path + member.name[1:]):
                continue
            tar.extract(member, path)
            files_extracted += 1
    if files_extracted < 3:
        print('Files already exist')
path = 'images/'
# extract_tar('cadod.tar.gz', path)
df = pd.read_csv('cadod.csv')
df.head()
!mkdir -p images/resized
%%time
# resize each image to 128x128, save it, and store the flattened pixels in an array
img_arr = np.zeros((df.shape[0], 128*128*3))  # one flattened RGB image per row
for i, f in enumerate(tqdm(df.ImageID)):
    img = Image.open(path + f + '.jpg')
    img_resized = img.resize((128, 128))
    img_resized.save("images/resized/" + f + '.jpg', "JPEG", optimize=True)
    img_arr[i] = np.asarray(img_resized, dtype=np.uint8).flatten()
Plot the resized and filtered images
# plot 6 random images with their ground-truth boxes
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread(path + '/resized/' + df.ImageID.values[j] + '.jpg')
    h, w = img.shape[:2]
    coords = df.iloc[j, 4:8]
    ax[i].imshow(img)
    ax[i].set_title(df.iloc[j, 2])
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h),
                                  coords[1]*w - coords[0]*w, coords[3]*h - coords[2]*h,
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
# encode labels
df['Label'] = (df.LabelName == 'dog').astype(np.uint8)
!mkdir -p data
np.save('data/img.npy', img_arr.astype(np.uint8))
np.save('data/y_label.npy', df.Label.values)
np.save('data/y_bbox.npy', df[['XMin', 'YMin', 'XMax', 'YMax']].values.astype(np.float32))
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
idx_to_label = {1:'dog', 0:'cat'} # encoder
Double check that it loaded correctly
# plot 6 random images from the saved arrays
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(X.shape[0], size=6, replace=False)):
    coords = y_bbox[j] * 128
    ax[i].imshow(X[j].reshape(128, 128, 3))
    ax[i].set_title(idx_to_label[y_label[j]])
    ax[i].add_patch(plt.Rectangle((coords[0], coords[1]),
                                  coords[2] - coords[0], coords[3] - coords[1],
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
The mean squared error formula is as follows: $ \text{MSE}({\mathbf{\theta}}; \mathbf{X}) = \dfrac{1}{m} \sum\limits_{i=1}^{m}{(\hat{y_i} - y_i)^2} $
where $m$ is the number of data points, $\hat{y_i}$ is the predicted value, and $y_i$ is the true value
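The formula maps directly to a few lines of NumPy (a standalone illustration, not part of the training pipeline):

```python
import numpy as np

def mse(y_true, y_pred):
    # (1/m) * sum of squared residuals
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mse(y_true, y_pred))  # (0.25 + 0 + 1) / 3 = 0.41666...
```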
import random

random.seed(10)
idxs = random.sample(range(0, X.shape[0]), 100)  # work on a random subset of 100 images
X_final = X[idxs]
y_label_final = y_label[idxs]
y_bbox_final = y_bbox[idxs]
X_train, X_test, y_train, y_test = train_test_split(X_final, y_bbox_final, test_size=0.01, random_state=27)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=27)
# np.random.seed(42)
# if np.max(X_train) > 4.:
# X_train = X_train.astype(np.float32) / 255.
# if np.max(X_test) > 4.:
# X_test = X_test.astype(np.float32) / 255.
#y_train=y_train.astype(int)
#y_test=y_test.astype(int)
from sklearn.preprocessing import MinMaxScaler, Normalizer
scaler = Normalizer(norm = 'l1')
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid) #Transform test set with the same constants
#X_test = scaler.transform(X_test) #Transform test set with the same constants
# Source: HW 5
class BasicLinearRegressionHomegrown(object):
    def __init__(self, l1_reg=0.0, l2_reg=0.01):
        self.coef_ = None        # weight vector
        self.intercept_ = None   # bias term
        self._theta = None       # augmented weight vector, i.e., bias + weights;
                                 # this allows us to treat all decision variables homogeneously
        self.l1_reg = l1_reg
        self.l2_reg = l2_reg
        self.history = {"cost": [],
                        "val_cost": [],
                        "coef": [],
                        "intercept": [],
                        "grad": []}

    def _grad(self, X, y):
        """
        Calculate the gradient of the objective function

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
        Return:
            gradient(ndarray): analytical gradient vector
        """
        pred = self.predict(X)
        error = pred - y
        gradient = np.dot(X.T, error) / X.shape[0]
        # gradient[1:] += 2 * self.l2_reg * self._theta[1:] + self.l1_reg * np.sign(self._theta[1:])
        return gradient

    # full gradient descent, i.e., not stochastic gd
    def _gd(self, X, y, max_iter, X_val, y_val, alpha=0.003):
        """
        Runs GD and logs error, weights, and gradient at every step

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
            max_iter(int): number of weight updates
            alpha(float): step size in direction of gradient
        Return:
            None
        """
        for i in range(max_iter):
            self.history["coef"].append(self._theta[1:].copy())
            self.history["intercept"].append(self._theta[0].copy())
            mse = self.score(X, y)
            self.history["cost"].append(mse)
            if X_val is not None:
                mse = self.score(X_val, y_val)
                self.history["val_cost"].append(mse)
            # calculate gradient
            grad = self._grad(X, y)
            self.history["grad"].append(grad)
            # do gradient step
            self._theta -= alpha * grad

    def fit(self, X, y, max_iter=100, val_data=None):
        """
        Public API for fitting a linear regression model

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
            max_iter(int): number of weight updates
        Return:
            self
        """
        # Augment the data with the bias term so we can treat the input
        # variables and the bias term homogeneously from a vectorization perspective
        X = np.c_[np.ones(X.shape[0]), X]
        if val_data is not None:
            X_val, y_val = val_data
            X_val = np.c_[np.ones(X_val.shape[0]), X_val]
        else:
            X_val = None
            y_val = None
        # initialize if this is the first step
        if self._theta is None:
            self._theta = np.random.rand(X.shape[1], 4)
        # do full gradient descent
        self._gd(X, y, max_iter, X_val, y_val)
        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def MSE(self, X, y, y_pred):
        error = y - y_pred
        mse = (np.sum(error ** 2)) / X.shape[0]
        return mse

    def score(self, X, y):
        pred = self.predict(X)
        mse = np.round(mean_squared_error(y, pred), 3)
        return mse

    def predict(self, X):
        """
        Make a prediction

        Args:
            X(ndarray): objects
        Return:
            pred(ndarray): predictions
        """
        # check whether X has the appended bias feature or not
        if X.shape[1] == len(self._theta):
            pred = np.dot(X, self._theta)
        else:
            pred = np.dot(X, self.coef_) + self.intercept_
        return pred
model_homegrown = BasicLinearRegressionHomegrown(l1_reg = 0, l2_reg = 0.0)
#Training the model for the 4 boundaries
np.random.seed(42)
model_homegrown.fit(X_train, y_train, max_iter=1000, val_data=(X_valid, y_valid))
y_pred_train = model_homegrown.predict(X_train)
cost_train = model_homegrown.history["cost"]
cost_val = model_homegrown.history["val_cost"]
# y_pred_train = np.concatenate((y_pred1, y_pred2,y_pred3,y_pred4),axis = 0)
# y_pred_test = np.concatenate((y_pred_test1, y_pred_test2,y_pred_test3,y_pred_test4),axis = 0)
# #y_pred_train
# #y_pred_test
Mean Squared Error:
plt.figure(figsize=(20, 8))
plt.suptitle("Homegrown Linear Regression")
#plt.subplot(121)
plt.plot(cost_train, label="Train")
plt.plot(cost_val, label="Valid")
plt.legend(loc="upper left")
plt.xlabel("Iteration")
plt.ylabel("Loss")
#plt.xlim([0,100])
#plt.ylim([0.0, 0.3])
#plt.subplot(122)
plt.show()
Implement a homegrown Logistic Regression model and extend the loss function from CXE to CXE + MSE, i.e., make it a complex multitask loss function so that the resulting model predicts the class and the bounding box coordinates at the same time.
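The multitask idea amounts to adding the two per-batch losses before taking a gradient step. A minimal NumPy sketch of the combined loss (binary CXE plus MSE; the variable names are illustrative and not tied to any particular implementation):

```python
import numpy as np

def binary_cxe(y, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mse(y, y_hat):
    return np.mean((y_hat - y) ** 2)

# toy batch: class labels/probabilities and bbox targets/predictions
y_cls, p_cls = np.array([1, 0]), np.array([0.9, 0.2])
y_box = np.array([[0.1, 0.1, 0.5, 0.5]])
p_box = np.array([[0.2, 0.1, 0.6, 0.5]])

total = binary_cxe(y_cls, p_cls) + mse(y_box, p_box)  # CXE + MSE
print(total)
```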
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test_label = train_test_split(X_final, y_label_final, test_size=0.01, random_state=27)
#s = StandardScaler()
#s.fit_transform(X_train)
#s.transform(X_test)
# Source: HW 7
class LogisticRegressionHomegrown(object):
    def __init__(self):
        """
        Constructor for the homegrown Logistic Regression
        """
        self.coef_ = None
        self.intercept_ = None
        self._theta = None
        self.history = {"cost": [],
                        "acc": [],
                        "val_cost": [],
                        "val_acc": [],
                        "val_prob": [],
                        "prob": []}

    def _grad(self, X, y):
        """
        Calculates the gradient of the Logistic Regression objective function

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
        Return:
            grad(ndarray): gradient
        """
        n = X.shape[0]
        scores = self._predict_raw(X)
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        probs[range(n), y] -= 1  # softmax gradient: subtract 1 at the true class
        gradient = np.dot(X.T, probs) / n
        return gradient

    def _gd(self, X, y, max_iter, alpha, X_val, y_val):
        for i in range(max_iter):
            metrics = self.score(X, y)
            self.history["cost"].append(metrics["cost"])
            self.history["acc"].append(metrics["acc"])
            self.history["prob"].append(metrics["prob"])
            if X_val is not None:
                metrics_val = self.score(X_val, y_val)
                self.history["val_cost"].append(metrics_val["cost"])
                self.history["val_acc"].append(metrics_val["acc"])
                self.history["val_prob"].append(metrics_val["prob"])
            # gradient step
            grad = self._grad(X, y)
            self._theta -= alpha * grad

    def fit(self, X, y, max_iter=100, alpha=0.05, val_data=None):
        X = np.c_[np.ones(X.shape[0]), X]  # append the bias feature
        if val_data is not None:
            X_val, y_val = val_data
            X_val = np.c_[np.ones(X_val.shape[0]), X_val]
        else:
            X_val = None
            y_val = None
        if self._theta is None:
            self._theta = np.random.rand(X.shape[1], len(np.unique(y)))
        self._gd(X, y, max_iter, alpha, X_val, y_val)
        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

    def score(self, X, y):
        n = X.shape[0]
        scores = self._predict_raw(X)
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        correct_logprobs = -np.log(probs[range(n), y])
        data_loss = np.sum(correct_logprobs) / n
        # predictions
        pred = np.argmax(scores, axis=1)
        # accuracy
        acc = accuracy_score(y, pred)
        metrics = {"acc": acc, "cost": data_loss, "prob": correct_logprobs}
        return metrics

    def _predict_raw(self, X):
        if X.shape[1] == len(self._theta):
            scores = np.dot(X, self._theta)
        else:
            scores = np.dot(X, self.coef_) + self.intercept_
        return scores

    def predict(self, X):
        """
        Predicts the class for each object in X

        Args:
            X(ndarray): objects
        Return:
            pred(ndarray): class for each object
        """
        # get scores for each class
        scores = self._predict_raw(X)
        pred = np.argmax(scores, axis=1)
        return pred
X_train, X_test, y_train, y_test_label = train_test_split(X_final, y_label_final, test_size=0.01, random_state=27)
np.random.seed(42)
if np.max(X_train) > 4.:
    X_train = X_train.astype(np.float32) / 255.
if np.max(X_test) > 4.:
    X_test = X_test.astype(np.float32) / 255.
y_train = y_train.astype(int)
y_test = y_test.astype(int)
class FixedLogisticRegressionHomegrown(LogisticRegressionHomegrown):
    def __init__(self):
        # call the constructor of the parent class
        super(FixedLogisticRegressionHomegrown, self).__init__()

    def _predict_raw(self, X):
        # check whether X has the appended bias feature or not
        if X.shape[1] == len(self._theta):
            scores = np.dot(X, self._theta)
        else:
            scores = np.dot(X, self.coef_) + self.intercept_
        # normalize raw scores to prevent overflow
        scores -= np.max(scores, axis=1, keepdims=True)
        return scores
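The max-subtraction in `_predict_raw` matters because `np.exp` overflows for large raw scores, while shifting each row by its maximum leaves the softmax probabilities unchanged. A small self-contained demonstration:

```python
import numpy as np

scores = np.array([[1000.0, 1001.0]])  # raw scores large enough to overflow exp

# naive softmax: exp(1000) overflows to inf, so inf/inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)

# stabilized softmax: shift each row by its max first (here to [-1, 0])
shifted = scores - np.max(scores, axis=1, keepdims=True)
stable = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(naive, stable)  # naive is all nan; stable is approx [0.269, 0.731]
```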
model_lr_homegrown_fixed = FixedLogisticRegressionHomegrown()
#Training model
model_lr_homegrown_fixed.fit(X_train, y_train, max_iter=2000, alpha=0.05, val_data=(X_test, y_test_label))
len(model_lr_homegrown_fixed.history["cost"])
plt.figure(figsize=(20, 8))
plt.suptitle("Homegrown Logistic Regression")
#plt.subplot(121)
plt.plot(model_lr_homegrown_fixed.history["cost"], label="Train")
plt.plot(model_lr_homegrown_fixed.history["val_cost"], label="Test")
plt.legend(loc="upper left")
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.show()
y_pred_test = model_lr_homegrown_fixed.predict(X_test)
Result:
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
import warnings
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_final, y_bbox_final, test_size=0.1, random_state=27)
X_train_r, X_valid_r, y_train_r, y_valid_r = train_test_split(X_train_r, y_train_r, test_size=0.1, random_state=27)
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_final, y_label_final, test_size=0.1, random_state=27)
X_train_c, X_valid_c, y_train_c, y_valid_c = train_test_split(X_train_c, y_train_c, test_size=0.1, random_state=27)
# scale data for regression
from sklearn.preprocessing import MinMaxScaler, Normalizer
scaler = Normalizer(norm='l2')
X_train_r = scaler.fit_transform(X_train_r)
X_valid_r = scaler.transform(X_valid_r)  # transform the validation set with the same constants
# X_test = scaler.transform(X_test)

# scale data for classification
np.random.seed(42)
if np.max(X_train_c) > 4.:
    X_train_c = X_train_c.astype(np.float32) / 255.
if np.max(X_valid_c) > 4.:
    X_valid_c = X_valid_c.astype(np.float32) / 255.
y_train_c = y_train_c.astype(int)
y_valid_c = y_valid_c.astype(int)
import warnings
warnings.filterwarnings('ignore')
class ComplexLogisticRegressionHomegrown(object):
    def __init__(self, l1_reg=0.0, l2_reg=0.01):
        """
        Constructor for the homegrown multitask (classification + regression) model
        """
        self.l1_reg = l1_reg
        self.l2_reg = l2_reg
        self.coef_r = None
        self.intercept_r = None
        self.coef_c = None
        self.intercept_c = None
        self._thetaReg = None
        self._thetaClass = None
        self.history = {"CXE + MSE train_loss": [],
                        "Train_acc_C": [],
                        "Train_CXE_C": [],
                        "Train_MSE_R": [],
                        "CXE + MSE val_loss": [],
                        "Val_CXE_C": [],
                        "Val_acc_C": [],
                        "Val_MSE_R": []}

    def _gradRegression(self, X, y):
        """
        Calculates the gradient of the Linear Regression (MSE) objective

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
        Return:
            gradient(ndarray): gradient
        """
        pred = np.dot(X, self._thetaReg)
        error = pred - y
        gradient = np.dot(X.T, error) / X.shape[0]
        # gradient[1:] += 2 * self.l2_reg * self._thetaReg[1:] + self.l1_reg * np.sign(self._thetaReg[1:])
        return gradient

    # gradient for the classifier
    def _gradClassification(self, X, y):
        """
        Calculates the gradient of the Logistic Regression (CXE) objective

        Args:
            X(ndarray): train objects
            y(ndarray): answers for train objects
        Return:
            gradient(ndarray): gradient
        """
        n = X.shape[0]
        scores = self._predict_raw(X, val=2)
        probs = 1.0 / (1 + np.exp(-scores))  # sigmoid probabilities
        probs[range(n), y] -= 1
        gradient = np.dot(X.T, probs) / n
        return gradient

    def _gd(self, X_r, y_r, X_c, y_c, max_iter, lr1, lr2, X_val_r, y_val_r, X_val_c, y_val_c):
        """
        Runs full GD and logs the combined loss and metrics at every step
        """
        for i in range(max_iter):
            metrics = self.score(X_r, y_r, X_c, y_c)
            print("Epoch:", i + 1, "-", metrics)
            self.history["CXE + MSE train_loss"].append(metrics["CXE + MSE loss"])
            self.history["Train_acc_C"].append(metrics["Acc"])
            self.history["Train_CXE_C"].append(metrics["CXE"])
            self.history["Train_MSE_R"].append(metrics["MSE"])
            if X_val_r is not None and X_val_c is not None:
                metrics_val = self.score(X_val_r, y_val_r, X_val_c, y_val_c)
                self.history["CXE + MSE val_loss"].append(metrics_val["CXE + MSE loss"])
                self.history["Val_acc_C"].append(metrics_val["Acc"])
                self.history["Val_CXE_C"].append(metrics_val["CXE"])
                self.history["Val_MSE_R"].append(metrics_val["MSE"])
            # calculate gradient for the regressor
            grad_reg = self._gradRegression(X_r, y_r)
            # calculate gradient for the classifier
            grad_class = self._gradClassification(X_c, y_c)
            # gradient steps
            self._thetaReg -= lr1 * grad_reg
            self._thetaClass -= lr2 * grad_class

    def fit(self, X_r, y_r, X_c, y_c, max_iter=1000, lr1=0.001, lr2=0.1,
            val_data_r=None, val_data_c=None):
        """
        Public API to fit the multitask model
        """
        # regression
        X_r = np.c_[np.ones(X_r.shape[0]), X_r]
        if val_data_r is not None:
            X_val_r, y_val_r = val_data_r
            X_val_r = np.c_[np.ones(X_val_r.shape[0]), X_val_r]
        else:
            X_val_r = None
            y_val_r = None
        # initialize if this is the first step
        if self._thetaReg is None:
            self._thetaReg = np.random.rand(X_r.shape[1], 4)
        # classification
        X_c = np.c_[np.ones(X_c.shape[0]), X_c]
        if val_data_c is not None:
            X_val_c, y_val_c = val_data_c
            X_val_c = np.c_[np.ones(X_val_c.shape[0]), X_val_c]
        else:
            X_val_c = None
            y_val_c = None
        # initialize if this is the first step
        if self._thetaClass is None:
            self._thetaClass = np.random.rand(X_c.shape[1], len(np.unique(y_c)))
        # do full gradient descent
        self._gd(X_r, y_r, X_c, y_c, max_iter, lr1, lr2, X_val_r, y_val_r, X_val_c, y_val_c)
        # final weights and biases
        self.intercept_r = self._thetaReg[0]
        self.coef_r = self._thetaReg[1:]
        self.intercept_c = self._thetaClass[0]
        self.coef_c = self._thetaClass[1:]

    def score(self, X_r, y_r, X_c, y_c):
        n2 = X_c.shape[0]
        # raw scores
        scores_r = self._predict_raw(X_r, val=1)
        scores_c = self._predict_raw(X_c, val=2)
        pred_r = scores_r
        # predicted probabilities for classification, used for the cross entropy
        probs = 1.0 / (1 + np.exp(-scores_c))
        preds_c = np.array([int(np.nan_to_num(probs[i][ind]))
                            for i, ind in enumerate(np.argmax(scores_c, axis=1))])
        acc = accuracy_score(y_c, preds_c)
        true_class_probs = probs[range(n2), y_c]
        true_class_probs = np.clip(true_class_probs, 1e-15, 1 - 1e-15)  # avoid log(0)
        loss = (-1 / n2) * np.sum(y_c * np.log(true_class_probs)
                                  + (1. - y_c) * np.log(1. - true_class_probs))
        mse = np.round(mean_squared_error(y_r, pred_r), 3)
        # all metrics
        metrics = {"Acc": acc,
                   "CXE + MSE loss": loss + mse,
                   "MSE": mse,
                   "CXE": loss}
        return metrics

    def _predict_raw(self, X, val):
        """
        Computes scores for each class and each object in X

        Args:
            X(ndarray): objects
            val(int): 1 for the regression head, 2 for the classification head
        Return:
            scores(ndarray): scores for each class and object
        """
        if val == 1:
            theta, coef, intercept = self._thetaReg, self.coef_r, self.intercept_r
        else:
            theta, coef, intercept = self._thetaClass, self.coef_c, self.intercept_c
        # check whether X has the appended bias feature or not
        if X.shape[1] == len(theta):
            scores = np.dot(X, theta)
        else:
            scores = np.dot(X, coef) + intercept
        return scores

    def predict(self, X):
        """
        Predicts the bounding box and class for each object in X

        Args:
            X(ndarray): objects
        Return:
            scores_r(ndarray): predicted box coordinates
            pred_c(ndarray): predicted class for each object
        """
        scores_r = self._predict_raw(X, val=1)
        scores_c = self._predict_raw(X, val=2)
        pred_c = np.argmax(scores_c, axis=1)
        return scores_r, pred_c
model_complex_homegrown = ComplexLogisticRegressionHomegrown(l1_reg = 0.0, l2_reg = 0.01)
model_complex_homegrown.fit(X_train_r, y_train_r,X_train_c, y_train_c, max_iter=500, lr1=0.004, lr2 = 0.0009, val_data_r=[X_valid_r,y_valid_r],val_data_c=[X_valid_c,y_valid_c])
plt.figure(figsize=(20, 8))
plt.suptitle("Homegrown Complex Model [CXE + MSE]")
#plt.subplot(121)
plt.plot(model_complex_homegrown.history["CXE + MSE train_loss"], label="Train")
plt.plot(model_complex_homegrown.history["CXE + MSE val_loss"], label="Test")
plt.legend(loc="upper left")
plt.xlabel("Iteration")
plt.ylabel("CXE + MSE");
plt.show()
plt.figure(figsize=(20, 8))
plt.suptitle("Homegrown Complex Logistic Regression")
#plt.subplot(121)
plt.plot(model_complex_homegrown.history["Train_acc_C"], label="Train")
plt.plot(model_complex_homegrown.history["Val_acc_C"], label="Test")
plt.legend(loc="upper left")
plt.xlabel("Iteration")
plt.ylabel("Accuracy");
plt.show()
model_complex_homegrown.history["Val_acc_C"][-1]
expLog = pd.DataFrame(columns=["exp_name",
                               "Training Accuracy",
                               "Validation Accuracy",
                               "Training CXE + MSE",
                               "Validation CXE + MSE"])
model_complex_homegrown.history["CXE + MSE train_loss"][-1]
exp_name = "Complex CXE + MSE Model"
expLog.loc[0, :5] = [exp_name] + list(np.round(
    [model_complex_homegrown.history["Train_acc_C"][-1],
     model_complex_homegrown.history["Val_acc_C"][-1],
     model_complex_homegrown.history["CXE + MSE train_loss"][-1],
     model_complex_homegrown.history["CXE + MSE val_loss"][-1]], 3))
expLog
X_test_r.shape[1]
X_test_r = scaler.transform(X_test_r)
pred_r, pred_c = model_complex_homegrown.predict(X_test_r)
pred_r
df = pd.read_csv('cadod.csv')
# plot 6 random images with predicted labels and boxes
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread('images/resized/' + df.ImageID.values[j] + '.jpg')
    h, w = img.shape[:2]
    coords = pred_r[i]
    ax[i].imshow(img)
    ax[i].set_title("Ground Truth: {0} \n Prediction: {1} ".format(idx_to_label[y_test_c[i]],
                                                                   idx_to_label[np.round(pred_c[i])]),
                    color=("green" if np.round(pred_c[i]) == y_test_c[i] else "red"))
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h),
                                  coords[1]*w - coords[0]*w, coords[3]*h - coords[2]*h,
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
EfficientNet forms the backbone of EfficientDet
Upon this backbone, we add the BiFPN layer, which is the feature network layer
This layer combines representations of a given image at different resolutions
We stack multiple BiFPN layers depending on the need
The final layer in this model is the Box/Class Net layer
This is the output layer, which predicts the class and the bounding boxes in a single stage
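The data flow described above can be sketched schematically (toy stand-ins only; none of these names come from the real EfficientDet code base):

```python
def efficientdet_forward(image, backbone, bifpn_layers, box_class_net):
    """Hypothetical sketch of the EfficientDet data flow described above."""
    features = backbone(image)      # EfficientNet backbone: multi-resolution features
    for bifpn in bifpn_layers:      # stacked BiFPN layers fuse the resolutions
        features = bifpn(features)
    return box_class_net(features)  # single-stage box + class predictions

# toy stand-ins, just to exercise the flow (not real layers):
backbone = lambda img: [img, img[::2]]             # two "resolutions"
bifpn = lambda feats: [f for f in feats]           # identity "fusion"
head = lambda feats: (len(feats[0]), len(feats))   # dummy (boxes, classes)
print(efficientdet_forward(list(range(8)), backbone, [bifpn, bifpn], head))  # (8, 2)
```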
We need to do the following before feeding the data into the EfficientDet model:
Create .json files for the train and validation data sets containing the bounding-box data for each image
Create training and test folders containing the images of cats and dogs
Create a .yml file with information about the mean, std, and anchors of the images
We used an existing EfficientDet implementation with transfer learning for the classification and regression tasks
We trained only the head layer, with a learning rate of 0.002, for 30 epochs
Accuracy - the ratio of correct predictions to total predictions
mAP (Mean Average Precision) - the mean over classes of the average precision of the detections
Precision is the ratio of true positives to all predicted positives:
Precision = True positives / (True positives + False positives)
IoU - Intersection over Union between the predicted and ground-truth boxes
Cross-entropy loss for classification and MSE for regression were used as the loss functions for the neural network
MSE - evaluation metric for regression:
MSE = (1/n) * sum((y - y_pred)^2)
where y_pred is the predicted value of y
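IoU, mentioned above, can be computed directly from two [xmin, ymin, xmax, ymax] boxes (a minimal sketch; the function name is ours, and the box format matches the bounding-box arrays used in this notebook):

```python
def iou(a, b):
    """Intersection over Union of two boxes in [xmin, ymin, xmax, ymax] format."""
    # width/height of the intersection rectangle (0 if the boxes do not overlap)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 2, 2], [1, 1, 3, 3]))  # intersection 1, union 7 -> 0.1428...
```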
import numpy as np
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import os
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
!pip install pycocotools numpy==1.16.0 opencv-python tqdm tensorboard tensorboardX pyyaml webcolors matplotlib
!pip install torch==1.4.0
!pip install torchvision==0.5.0
import os
import sys

if "projects" not in os.getcwd():
    !git clone --depth 1 https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch
    os.chdir('Yet-Another-EfficientDet-Pytorch')
    sys.path.append('.')
else:
    !git pull
# download pretrained weights
! mkdir weights
! wget https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/releases/download/1.0/efficientdet-d0.pth -O weights/efficientdet-d0.pth
print (np.__version__)
#Preparing dataset for efficientdet
import pandas as pd
df = pd.read_csv('cadod.csv')
df.LabelName.replace({'/m/01yrx':'cat', '/m/0bt9lr':'dog'}, inplace=True)
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test_label = train_test_split(df, y_label, test_size=0.2, shuffle=True, stratify=y_label , random_state=27)
X_train['LabelName'].value_counts()
X_test['LabelName'].value_counts()
import os
if not os.path.exists('datasets'):
    os.mkdir('datasets')
    os.mkdir('datasets/cadod')
    os.mkdir('datasets/cadod/train')
    os.mkdir('datasets/cadod/val')
    os.mkdir('datasets/cadod/annotations')
import json
import copy

# NOTE: `train_json = val_json = {...}` binds both names to the SAME dict,
# so the two splits would silently share their 'images'/'annotations' keys.
# Build the common skeleton once and give each split its own copy.
base_json = {
    "info": {
        "description": "cadod dataset",
    },
    "licenses": [
        {
            "id": 1,
            "name": 'null',
            "url": 'null'
        }
    ],
    "categories": [
        {
            "id": 1,
            "name": "cat",
            "supercategory": "NA"
        },
        {
            "id": 2,
            "name": "dog",
            "supercategory": "NA"
        }
    ]
}
train_json = copy.deepcopy(base_json)
val_json = copy.deepcopy(base_json)
def category(j, img_name, class_label, bbox):
    """Build the COCO-style image and annotation entries for one 128x128 image.

    bbox is (x_min, y_min, width, height), normalized to [0, 1].
    """
    img_data = {"id": j,
                "file_name": img_name,
                "width": 128,
                "height": 128,
                "date_captured": "2021-12-14 01:45:18.567975",
                "license": 1,
                "cadod_url": "",
                "flickr_url": ""}
    annotations_data = {
        "id": j,
        "image_id": j,
        "category_id": class_label,
        "iscrowd": 0,
        "area": math.floor(bbox[2] * bbox[3] * 128 * 128),
        "bbox": [
            math.floor(bbox[0] * 128),
            math.floor(bbox[1] * 128),
            math.floor(bbox[2] * 128),
            math.floor(bbox[3] * 128)
        ]
    }
    return img_data, annotations_data
from shutil import copyfile, copy, copy2
import math
images = []
annotation = []
j = 0
count_c = 0
count_d = 0
for i in tqdm(X_train.iterrows()):
    bbox = []
    img_name = i[1]['ImageID'] + '.jpg'
    class_name = i[1]['LabelName']
    bbox.append(i[1]['XMin'])
    bbox.append(i[1]['YMin'])
    bbox.append(abs(i[1]['XMin'] - i[1]['XMax']))
    bbox.append(abs(i[1]['YMin'] - i[1]['YMax']))
    if class_name == 'dog' and count_d < 750:
        image_data, annotations_data = category(j, img_name, 2, bbox)
        images.append(image_data)
        annotation.append(annotations_data)
        copyfile(src='images/resized/' + img_name, dst='datasets/cadod/train/' + img_name)
        count_d += 1
    elif class_name == 'cat' and count_c < 750:
        image_data, annotations_data = category(j, img_name, 1, bbox)
        images.append(image_data)
        annotation.append(annotations_data)
        copyfile(src='images/resized/' + img_name, dst='datasets/cadod/train/' + img_name)
        count_c += 1
    j += 1
train_json['images'] = images
train_json['annotations'] = annotation
with open('datasets/cadod/annotations/instances_train.json', 'w') as json_file:
    json.dump(train_json, json_file, indent=4)
images = []
annotation = []
j = 0
count_c = 0
count_d = 0
for i in tqdm(X_test.iterrows()):
    bbox = []
    img_name = i[1]['ImageID'] + '.jpg'
    class_name = i[1]['LabelName']
    bbox.append(i[1]['XMin'])
    bbox.append(i[1]['YMin'])
    bbox.append(abs(i[1]['XMin'] - i[1]['XMax']))
    bbox.append(abs(i[1]['YMin'] - i[1]['YMax']))
    if class_name == 'dog' and count_d < 250:
        image_data, annotations_data = category(j, img_name, 2, bbox)
        images.append(image_data)
        annotation.append(annotations_data)
        copyfile(src='images/resized/' + img_name, dst='datasets/cadod/val/' + img_name)
        count_d += 1
    elif class_name == 'cat' and count_c < 250:
        image_data, annotations_data = category(j, img_name, 1, bbox)
        images.append(image_data)
        annotation.append(annotations_data)
        copyfile(src='images/resized/' + img_name, dst='datasets/cadod/val/' + img_name)
        count_c += 1
    j += 1
val_json['images'] = images
val_json['annotations'] = annotation
with open('datasets/cadod/annotations/instances_val.json', 'w') as json_file:
    json.dump(val_json, json_file, indent=4)
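Before training, it is worth sanity-checking the structure of the annotation files just written, since the COCO loaders tend to fail with opaque errors on malformed input. A minimal check (a sketch, not a full COCO validator), demonstrated on a tiny inline dict:

```python
def check_coco(d):
    """Verify the minimal COCO-style structure the detector's loader expects."""
    assert {"categories", "images", "annotations"} <= set(d.keys())
    img_ids = {img["id"] for img in d["images"]}
    cat_ids = {c["id"] for c in d["categories"]}
    for ann in d["annotations"]:
        assert ann["image_id"] in img_ids, "annotation points at a missing image"
        assert ann["category_id"] in cat_ids, "unknown category id"
        x, y, w, h = ann["bbox"]  # COCO bboxes are [x, y, width, height]
        assert w >= 0 and h >= 0, "negative box size"
    return True

sample = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "images": [{"id": 0, "file_name": "a.jpg", "width": 128, "height": 128}],
    "annotations": [{"id": 0, "image_id": 0, "category_id": 2,
                     "iscrowd": 0, "area": 4096, "bbox": [10, 10, 64, 64]}],
}
assert check_coco(sample)
```

In this notebook, `check_coco(train_json)` and `check_coco(val_json)` could be run right after the dumps above.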
import os, os.path
root = 'datasets/cadod/train/'
print("# of images in train folder:",len([x for x in os.listdir(root) if os.path.isfile(os.path.join(root, x))]))
root = 'datasets/cadod/val/'
print("# of images in val folder:",len([x for x in os.listdir(root) if os.path.isfile(os.path.join(root, x))]))
!cat cadod.yml
#pip install opencv-python-headless
! python Yet-Another-EfficientDet-Pytorch/train.py -c 0 -p cadod --head_only True --lr 0.0002 --batch_size 32 --load_weights weights/efficientdet-d0.pth --num_epochs 30 --save_interval 150
%cd logs/cadod
weight_file = !ls -Art | grep efficientdet
%cd ../..
# specify a checkpoint explicitly (otherwise the latest file found above is used)
weight_file[-1] = 'efficientdet-d0_29_final.pth'
! python Yet-Another-EfficientDet-Pytorch/coco_eval.py -c 0 -p cadod -w "logs/cadod/{weight_file[-1]}"
# Author: Zylo117
import torch
from torch import nn
from efficientdet.model import BiFPN, Regressor, Classifier, EfficientNet
from efficientdet.utils import Anchors
class EfficientDetBackbone(nn.Module):
    def __init__(self, num_classes=80, compound_coef=0, load_weights=False, **kwargs):
        super(EfficientDetBackbone, self).__init__()
        self.compound_coef = compound_coef
        self.backbone_compound_coef = [0, 1, 2, 3, 4, 5, 6, 6, 7]
        self.fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384, 384]
        self.fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8, 8]
        self.input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536, 1536]
        self.box_class_repeats = [3, 3, 3, 4, 4, 4, 5, 5, 5]
        self.pyramid_levels = [5, 5, 5, 5, 5, 5, 5, 5, 6]
        self.anchor_scale = [4., 4., 4., 4., 4., 4., 4., 5., 4.]
        self.aspect_ratios = kwargs.get('ratios', [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)])
        self.num_scales = len(kwargs.get('scales', [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]))
        conv_channel_coef = {
            # the channels of P3/P4/P5.
            0: [40, 112, 320],
            1: [40, 112, 320],
            2: [48, 120, 352],
            3: [48, 136, 384],
            4: [56, 160, 448],
            5: [64, 176, 512],
            6: [72, 200, 576],
            7: [72, 200, 576],
            8: [80, 224, 640],
        }
        num_anchors = len(self.aspect_ratios) * self.num_scales
        self.bifpn = nn.Sequential(
            *[BiFPN(self.fpn_num_filters[self.compound_coef],
                    conv_channel_coef[compound_coef],
                    True if _ == 0 else False,
                    attention=True if compound_coef < 6 else False,
                    use_p8=compound_coef > 7)
              for _ in range(self.fpn_cell_repeats[compound_coef])])
        self.num_classes = num_classes
        self.regressor = Regressor(in_channels=self.fpn_num_filters[self.compound_coef], num_anchors=num_anchors,
                                   num_layers=self.box_class_repeats[self.compound_coef],
                                   pyramid_levels=self.pyramid_levels[self.compound_coef])
        self.classifier = Classifier(in_channels=self.fpn_num_filters[self.compound_coef], num_anchors=num_anchors,
                                     num_classes=num_classes,
                                     num_layers=self.box_class_repeats[self.compound_coef],
                                     pyramid_levels=self.pyramid_levels[self.compound_coef])
        self.anchors = Anchors(anchor_scale=self.anchor_scale[compound_coef],
                               pyramid_levels=(torch.arange(self.pyramid_levels[self.compound_coef]) + 3).tolist(),
                               **kwargs)
        self.backbone_net = EfficientNet(self.backbone_compound_coef[compound_coef], load_weights)

    def freeze_bn(self):
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()

    def forward(self, inputs):
        max_size = inputs.shape[-1]
        _, p3, p4, p5 = self.backbone_net(inputs)
        features = (p3, p4, p5)
        features = self.bifpn(features)
        regression = self.regressor(features)
        classification = self.classifier(features)
        anchors = self.anchors(inputs, inputs.dtype)
        return features, regression, classification, anchors

    def init_backbone(self, path):
        state_dict = torch.load(path)
        try:
            ret = self.load_state_dict(state_dict, strict=False)
            print(ret)
        except RuntimeError as e:
            print('Ignoring ' + str(e))
import torch
from torch.backends import cudnn
#from backbone import EfficientDetBackbone
import cv2
import matplotlib.pyplot as plt
import numpy as np
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess
compound_coef = 0
force_input_size = None # set None to use default size
img_path = 'datasets/cadod/val/00e4d41ec48a1d17.jpg'
threshold = 0.2
iou_threshold = 0.2
use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True
obj_list = [ 'cat', 'dog' ]
# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size
ori_imgs, framed_imgs, framed_metas = preprocess(img_path, max_size=input_size)
# if use_cuda:
# x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
# else:
x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)
x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)
model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
# replace this part with your project's anchor config
ratios=[(0.7, 1.4), (1.0, 1.0), (1.5, 0.7)],
scales=[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])
model.load_state_dict(torch.load('logs/cadod/'+weight_file[-1]))
model.requires_grad_(False)
model.eval()
# if use_cuda:
# model = model.cuda()
if use_float16:
    model = model.half()

with torch.no_grad():
    features, regression, classification, anchors = model(x)
    regressBoxes = BBoxTransform()
    clipBoxes = ClipBoxes()
    out = postprocess(x,
                      anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold, iou_threshold)

out = invert_affine(framed_metas, out)
for i in range(len(ori_imgs)):
    if len(out[i]['rois']) == 0:
        continue
    ori_imgs[i] = ori_imgs[i].copy()
    for j in range(len(out[i]['rois'])):
        (x1, y1, x2, y2) = out[i]['rois'][j].astype(np.int)
        cv2.rectangle(ori_imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
        obj = obj_list[out[i]['class_ids'][j]]
        score = float(out[i]['scores'][j])
        cv2.putText(ori_imgs[i], '{}, {:.3f}'.format(obj, score),
                    (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 255, 0), 1)
    plt.imshow(ori_imgs[i])
#plt.show()
import numpy as np
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import os
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
# download pretrained weights
#! mkdir weights
! wget https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/releases/download/1.0/efficientdet-d1.pth -O weights/efficientdet-d1.pth
!cat cadod.yml
#pip install opencv-python-headless
! python Yet-Another-EfficientDet-Pytorch/train.py -c 0 -p cadod --head_only True --lr 0.0002 --batch_size 32 --load_weights weights/efficientdet-d1.pth --num_epochs 30 --save_interval 150
# specify the checkpoint to evaluate
weight_file = 'efficientdet-d1_29_final.pth'
! python Yet-Another-EfficientDet-Pytorch/coco_eval.py -c 0 -p cadod -w "logs/cadod/{weight_file}"
import torch
from torch.backends import cudnn
from backbone import EfficientDetBackbone
import cv2
import matplotlib.pyplot as plt
import numpy as np
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess
compound_coef = 0
force_input_size = None # set None to use default size
img_path = 'datasets/cadod/val/00e4d41ec48a1d17.jpg'
threshold = 0.2
iou_threshold = 0.2
use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True
obj_list = [ 'cat', 'dog' ]
# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size
ori_imgs, framed_imgs, framed_metas = preprocess(img_path, max_size=input_size)
x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)
x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)
model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
# replace this part with your project's anchor config
ratios=[(0.7, 1.4), (1.0, 1.0), (1.5, 0.7)],
scales=[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])
model.load_state_dict(torch.load('logs/cadod/'+weight_file))
model.requires_grad_(False)
model.eval()
with torch.no_grad():
    features, regression, classification, anchors = model(x)
    regressBoxes = BBoxTransform()
    clipBoxes = ClipBoxes()
    out = postprocess(x,
                      anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold, iou_threshold)

out = invert_affine(framed_metas, out)
for i in range(len(ori_imgs)):
    if len(out[i]['rois']) == 0:
        continue
    ori_imgs[i] = ori_imgs[i].copy()
    for j in range(len(out[i]['rois'])):
        (x1, y1, x2, y2) = out[i]['rois'][j].astype(np.int)
        cv2.rectangle(ori_imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
        obj = obj_list[out[i]['class_ids'][j]]
        score = float(out[i]['scores'][j])
        cv2.putText(ori_imgs[i], '{}, {:.3f}'.format(obj, score),
                    (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 255, 0), 1)
    plt.imshow(ori_imgs[i])
import numpy as np
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import os
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
# download pretrained weights
#! mkdir weights
! wget https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/releases/download/1.0/efficientdet-d2.pth -O weights/efficientdet-d2.pth
!cat cadod.yml
#pip install opencv-python-headless
! python Yet-Another-EfficientDet-Pytorch/train.py -c 0 -p cadod --head_only True --lr 0.0002 --batch_size 32 --load_weights weights/efficientdet-d2.pth --num_epochs 30 --save_interval 150
%cd logs/cadod
weight_file = !ls -Art | grep efficientdet
%cd ../..
# specify a checkpoint explicitly (otherwise the latest file found above is used)
weight_file[-1] = 'efficientdet-d2_25_final.pth'
! python Yet-Another-EfficientDet-Pytorch/coco_eval.py -c 0 -p cadod -w "logs/cadod/{weight_file[-1]}"
import matplotlib.pyplot as plt  # `import matplotlib` alone does not load the pyplot submodule
plt.ion()
import numpy as np
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import os
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
# download pretrained weights
#! mkdir weights
! wget https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/releases/download/1.0/efficientdet-d7.pth -O weights/efficientdet-d7.pth
!cat cadod.yml
#pip install opencv-python-headless
! python Yet-Another-EfficientDet-Pytorch/train.py -c 0 -p cadod --head_only True --lr 0.0002 --batch_size 32 --load_weights weights/efficientdet-d7.pth --num_epochs 30 --save_interval 150
%cd logs/cadod
weight_file = !ls -Art | grep efficientdet
%cd ../..
# specify the checkpoint to evaluate
weight_file = 'efficientdet-d0_29_1350.pth'
! python coco_eval.py -c 0 -p cadod -w "{weight_file}"
import os
os.getcwd()
#os.chdir('/N/u/svtranga/Carbonate/Downloads/')
import torch
from torch.backends import cudnn
#from backbone import EfficientDetBackbone
import cv2
import matplotlib.pyplot as plt
import numpy as np
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess
compound_coef = 0
force_input_size = None # set None to use default size
img_path = 'datasets/cadod/val/1b798f26f1fa0229.jpg'
threshold = 0.2
iou_threshold = 0.2
use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True
obj_list = ['cat', 'dog']  # same order as the category ids in the annotations (cat=1, dog=2)
# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size
ori_imgs, framed_imgs, framed_metas = preprocess(img_path, max_size=input_size)
# if use_cuda:
# x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
# else:
x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)
x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)
model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
# replace this part with your project's anchor config
ratios=[(0.7, 1.4), (1.0, 1.0), (1.5, 0.7)],
scales=[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])
model.load_state_dict(torch.load('logs/cadod/'+weight_file))
model.requires_grad_(False)
model.eval()
# if use_cuda:
# model = model.cuda()
if use_float16:
    model = model.half()

with torch.no_grad():
    features, regression, classification, anchors = model(x)
    regressBoxes = BBoxTransform()
    clipBoxes = ClipBoxes()
    out = postprocess(x,
                      anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold, iou_threshold)

out = invert_affine(framed_metas, out)
for i in range(len(ori_imgs)):
    if len(out[i]['rois']) == 0:
        continue
    ori_imgs[i] = ori_imgs[i].copy()
    for j in range(len(out[i]['rois'])):
        (x1, y1, x2, y2) = out[i]['rois'][j].astype(np.int)
        cv2.rectangle(ori_imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
        obj = obj_list[out[i]['class_ids'][j]]
        score = float(out[i]['scores'][j])
        cv2.putText(ori_imgs[i], '{}, {:.3f}'.format(obj, score),
                    (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 255, 0), 1)
    plt.imshow(ori_imgs[i])
    plt.show()
    break  # show only the first image
EfficientDet = BiFPN + compound scaling, for better accuracy and efficiency across a wide variety of resource constraints
D7 uses a larger compound-scaling coefficient and therefore needs more epochs to converge than D0
We split the data into train, validation, and test sets and used 30 epochs to aid convergence
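The scaling gap between D0 and D7 is visible in the constants of the EfficientDetBackbone class used in this notebook (values copied from its `__init__` lists, indexed by compound coefficient):

```python
# Scaling constants from EfficientDetBackbone, indexed by compound coefficient (D0..D8).
fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384, 384]
fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8, 8]
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536, 1536]
box_class_repeats = [3, 3, 3, 4, 4, 4, 5, 5, 5]

for d in (0, 7):
    print(f"D{d}: {input_sizes[d]}px input, "
          f"BiFPN {fpn_num_filters[d]} filters x {fpn_cell_repeats[d]} repeats, "
          f"head depth {box_class_repeats[d]}")
```

D7 sees 3x larger inputs, a 6x wider BiFPN with nearly 3x more repeats, and deeper heads than D0, so at an equal epoch budget it is still further from convergence, consistent with its lower mAP here.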
For image classification, we used two Conv2D layers, two max-pooling layers, and one output layer.
ReLU was used as the activation function and cross-entropy as the loss function.
In addition to this, we experimented with dropout layers.
A dropout rate of 0.1 yielded the model with the best accuracy.
We used the Adam optimizer for convergence.
For the bounding-box regressor, we split the data into train, validation, and test sets and used 50 epochs to aid convergence.
We used three fully connected layers to train this model.
We used ReLU as our activation function.
Mean squared error is the loss function, and we experimented with stochastic gradient descent and Adam as optimizers.
Adam was chosen as it gave us the better model.
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from torch.optim import Adam
import matplotlib.pyplot as plt
import seaborn as sns
# Setting seeds to try and ensure we have the same results - this is not guaranteed across PyTorch releases.
import torch
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, SubsetRandomSampler
import random
from os import listdir
from shutil import copyfile
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from os import makedirs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
from google.colab import drive
drive.mount('/content/drive')
!ls "/content/drive/My Drive/Colab Notebooks"
random.seed(42)
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
print(X.shape, y_label.shape)
#creating directories for test and train data set
dataset_home = 'cat_vs_dog/'
subdirs = ['train/', 'test/']
for subdir in subdirs:
    # create label subdirectories
    labeldirs = ['dogs/', 'cats/']
    for labldir in labeldirs:
        newdir = dataset_home + subdir + labldir
        makedirs(newdir, exist_ok=True)
#load the csv file
df = pd.read_csv("cadod.csv")
df.head()
df.LabelName.replace({'/m/01yrx':'cat', '/m/0bt9lr':'dog'}, inplace=True)
df.head()
dog_list = df[df.LabelName == 'dog']['ImageID']
cat_list = df[df.LabelName == 'cat']['ImageID']
#list(dog_list)
# moving images to test and train folder
random.seed(10)
# define ratio of pictures to use for test
test_ratio = 0.20
count_c = 0
count_d = 0
# copy training dataset images into subdirectories
src_directory = 'cadod/'
for file in listdir(src_directory):
    src = src_directory + file
    dst_dir = 'train/'
    if random.random() < test_ratio:
        dst_dir = 'test/'
    if file.replace('.jpg', '').replace('._', '') in list(cat_list) and count_c < 2000:
        dst = dataset_home + dst_dir + 'cats/' + file
        count_c += 1
        copyfile(src, dst)
    elif file.replace('.jpg', '').replace('._', '') in list(dog_list) and count_d < 2000:
        dst = dataset_home + dst_dir + 'dogs/' + file
        count_d += 1
        copyfile(src, dst)
from tensorflow.keras import models
from tensorflow.keras.applications import *
from tensorflow.keras import layers
expLog = pd.DataFrame(columns=["exp_name",
"Train Acc",
"Valid Acc",
"Test Acc",
])
dataset_size
train_set, val_set = torch.utils.data.random_split(train_it, [600, 200])
#splitting data into train, validation and test
trainloader = DataLoader(train_set, batch_size=64, shuffle=True)
valloader = DataLoader(val_set, shuffle=True, batch_size=16)
testloader = DataLoader(test_it, batch_size=32, shuffle=True)
for images, labels in trainloader:
    print(images.size(), labels.size())
    print(labels)
    break
from torch.utils.tensorboard import SummaryWriter
import numpy as np
writer = SummaryWriter()
#Data augmentation
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
transform_test = transforms.Compose([
#transforms.ToPILImage(),
transforms.Resize((128, 128)),
transforms.ToTensor(),
transforms.Normalize(mean=mean, std=std)
])
transform_train = transforms.Compose([
#transforms.ToPILImage(),
transforms.Resize((128, 128)),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(40),
transforms.ToTensor(),
transforms.Normalize(mean=mean, std=std)
#transforms.RandomAutocontrast()
])
train_it = datasets.ImageFolder('cat_vs_dog/train/', transform=transform_train)
test_it = datasets.ImageFolder('cat_vs_dog/test/', transform=transform_test)
dataset_size = len(train_it)
dataset_indices = list(range(dataset_size))
np.random.shuffle(dataset_indices)
dataset_size
train_set, val_set = torch.utils.data.random_split(train_it, [700, 100])
trainloader = DataLoader(train_set, batch_size=54, shuffle=True)
valloader = DataLoader(val_set, shuffle=True, batch_size=16)
testloader = DataLoader(test_it, batch_size=16, shuffle=True)
for images, labels in trainloader:
    print(images.size(), labels.size())
    print(labels)
    break
for batch, (images, labels) in enumerate(trainloader, 1):
    print(sum(labels))
    break
# defining neural network layers
class cadod(nn.Module):
    def __init__(self):
        super(cadod, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(0.1)
        self.fc1 = nn.Linear(18000, 400)  # 20 channels * 30 * 30 after the conv/pool stack
        self.fc2 = nn.Linear(400, 2)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)  # flatten before the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
model = cadod()
criterion = nn.CrossEntropyLoss() #Loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002, weight_decay = 0.0001) #optimizer with learning rate of 0.0002
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones = [500,1000,1500], gamma = 0.5)
!rm -rf logs  # shell command: clear old TensorBoard logs
accuracy_stats = {
'train': [],
"val": []
}
loss_stats = {
'train': [],
"val": []
}
import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
model = model.to(device)
#optimizer = Adam(filter(lambda p: p.requires_grad, model.parameters()))
num_epochs = 75
for e in range(num_epochs):
    cum_epoch_loss = 0
    cum_acc = 0
    batch_loss = 0
    model.train()
    for batch, (images, labels) in enumerate(trainloader, 1):
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        label_pred = model(images).squeeze()
        loss = criterion(label_pred, labels)
        acc = binary_acc(label_pred, labels)
        loss.backward()
        optimizer.step()
        batch_loss += loss.item()
        cum_acc += acc.item()
        scheduler.step()
    with torch.no_grad():
        model.eval()
        val_epoch_loss = 0
        val_epoch_acc = 0
        for batch, (X_val_batch, y_val_batch) in enumerate(valloader, 1):
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
            y_val_pred = model(X_val_batch).squeeze()
            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = binary_acc(y_val_pred, y_val_batch)
            val_epoch_loss += val_loss.item()
            val_epoch_acc += val_acc.item()
    loss_stats['train'].append(batch_loss/len(trainloader))
    loss_stats['val'].append(val_epoch_loss/len(valloader))
    accuracy_stats['train'].append(cum_acc/len(trainloader))
    accuracy_stats['val'].append(val_epoch_acc/len(valloader))
    print(f'Epoch({e+1}/{num_epochs})')
    print(f'Training loss : {batch_loss/len(trainloader)}')
    print(f'Training accuracy : {cum_acc/len(trainloader)}')
    print(f'Validation loss : {val_epoch_loss/len(valloader)}')
    print(f'Validation accuracy : {val_epoch_acc/len(valloader)}')
    # For tensorboard
    writer.add_scalars('Loss', {'Training Loss': batch_loss/len(trainloader),
                                'Validation Loss': val_epoch_loss/len(valloader),}, e)
    writer.add_scalars('Accuracy', {'Accuracy/train': cum_acc/len(trainloader),
                                    'Accuracy/validation': val_epoch_acc/len(valloader),}, e)
writer.close()
%load_ext tensorboard
%tensorboard --logdir=runs
y_pred_list = []
y_true_list = []
model.eval()
with torch.no_grad():
    num_correct = 0
    total = 0
    for batch, (images, labels) in enumerate(testloader, 1):
        images, labels = images.to(device), labels.to(device)  # keep data on the same device as the model
        pred = model(images)
        pred = torch.argmax(pred, 1)
        y_pred_list.append(pred.cpu().numpy())
        y_true_list.append(labels.cpu().numpy())
        total += labels.size(0)
        num_correct += (pred == labels).sum().item()
        print(f'Batch ({batch}/{len(testloader)})')
print(f'Accuracy of the model on {total} test images: {num_correct * 100 / total}% ')
y_pred_list = []
y_true_list = []
with torch.no_grad():
    for batch, (images, labels) in enumerate(testloader, 1):
        images = images.to(device)
        y_test_pred = model(images)
        _, y_pred_tag = torch.max(y_test_pred, dim = 1)
        y_pred_list = [*y_pred_list, *y_pred_tag.cpu().numpy()]
        y_true_list = [*y_true_list, *labels.cpu().numpy()]
y_pred_list_1 = np.array(y_pred_list).T
y_true_list_1 = np.array(y_true_list).T
#y_pred_list_1
exp_name = f"FCN with Dropout (p = 0.1)"
expLog.loc[5,:4] = [f"{exp_name}"] + list(np.round(
[accuracy_stats['train'][-1],
accuracy_stats['val'][-1],
(num_correct * 100 / total)],3))
expLog
import torch
import torchvision
import torch.utils.data
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.load('data/img.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
X_train, X_test, y_train, y_test = train_test_split(X, y_bbox, test_size=0.15, random_state=42)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.15, random_state=42)
from torchsummary import summary #install it if necessary using !pip install torchsummary
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
## Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train).astype(float)
X_validation = scaler.transform(X_validation).astype(float) #Transform valid set with the same constants
X_test = scaler.transform(X_test).astype(float) #Transform test set with the same constants
# convert numpy arrays to tensors
X_train_tensor = torch.from_numpy(X_train).float()
X_validation_tensor = torch.from_numpy(X_validation).float()
X_test_tensor = torch.from_numpy(X_test).float()
y_train_tensor = torch.from_numpy(y_train).float()
y_test_tensor = torch.from_numpy(y_test).float()
y_validation_tensor = torch.from_numpy(y_validation).float()
# create TensorDataset in PyTorch
train_ds = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
validation_ds = torch.utils.data.TensorDataset(X_validation_tensor, y_validation_tensor)
test_ds = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)
# create dataloader
batch_size = 96
train_loader = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=0)
valid_loader = torch.utils.data.DataLoader(validation_ds, batch_size=X_test.shape[0], shuffle=False, num_workers=0)
test_loader = torch.utils.data.DataLoader(test_ds, batch_size=X_test.shape[0], shuffle=False, num_workers=0)
#regression neural network
class Regression(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=3)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(in_features=14*14*16, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc5 = nn.Linear(in_features=32, out_features=4)
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        x = nn.Flatten()(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc3(x))
        r = self.fc5(x)
        return r
loss_stats = {
'train': [],
"val": []
}
%reload_ext tensorboard
model = Regression()  # instantiate the regression network defined above
loss_fn = torch.nn.MSELoss(reduction='mean')  # size_average is deprecated
optimizer = optim.Adam(model.parameters(), lr=0.00001, weight_decay = 0.005)
epochs = range(25)
running_loss = 0.0
for epoch in epochs:
    running_loss_t = 0.0
    count_t = 0  # reset per epoch so the average is per-epoch, not cumulative
    model.train()
    for batch, data in enumerate(train_loader):
        inputs, target = data[0], data[1]
        # Clear gradient buffers so gradients from the previous batch do not accumulate
        optimizer.zero_grad()
        # forward pass
        output = model(inputs.float())
        # compute loss; output and target are both (batch, 4), so no unsqueeze is needed
        loss = loss_fn(output, target.float())
        # gradients w.r.t. parameters
        loss.backward()
        # gradient update
        optimizer.step()
        # statistics
        running_loss_t += loss.item()*inputs.size(0)
        count_t += inputs.size(0)
    print(f"Epoch {epoch+1}, Train MSE loss: {np.round(running_loss_t/count_t, 3)}")
    loss_stats['train'].append(running_loss_t/count_t)
    # Model evaluation on validation set
    count = 0
    running_loss = 0.0
    test_size = 0
    model.eval()
    for batch, data in enumerate(valid_loader):
        inputs, target = data[0], data[1]
        # forward pass
        output = model(inputs.float())
        loss = loss_fn(output, target.float())
        running_loss += loss.item()*inputs.size(0)
        count += inputs.size(0)
        test_size += batch_size
    print(f"Validation MSE loss: {np.round(running_loss/count, 3)}")
    loss_stats['val'].append(running_loss/count)
    # for tensorboard: log training and validation MSE with their own values
    writer.add_scalars('MSE', {'Training MSE': np.round(running_loss_t/count_t, 3),
                               'Validation MSE': np.round(running_loss/count, 3),}, epoch)
writer.close()
print("Finished Training!")
count = 0
running_loss = 0.0
test_size = 0
model.eval()
with torch.no_grad():
    for batch, data in enumerate(test_loader):
        inputs, target = data[0], data[1]
        # forward pass
        output = model(inputs.float())
        # compute loss; both tensors are (batch, 4)
        loss = loss_fn(output, target.float())
        running_loss += loss.item()*inputs.size(0)
        count += inputs.size(0)
        test_size += batch_size
print(f"Test MSE loss: {np.round(running_loss/count, 3)}")
# predict test
# model.to(device)
expLog_R = pd.DataFrame(columns=["exp_name",
"Train MSE",
"Valid MSE",
"Test MSE",
])
#Logging the experiment
exp_name = f"FCN Regressor"
expLog_R.loc[8,:4] = [f"{exp_name}"] + list(np.round(
[loss_stats['train'][-1],
loss_stats['val'][-1],
running_loss/count],3))
expLog_R
We defined separate models for classification and regression.
For each epoch, we calculated the CXE and MSE losses from the two models and used the combined loss for backpropagation.
The Adam optimizer was used to take the next step and update the weights and biases of the network.
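The steps above can be sketched as a minimal combined-loss update. This is an illustrative stand-in, not our actual networks: `clf_head` and `reg_head` are hypothetical single-layer heads on random features, standing in for the classifier and regressor defined below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in heads: a 2-class classifier and a 4-value box regressor
# operating on the same features (illustrative shapes only).
feat = torch.randn(8, 16)                    # batch of 8 feature vectors
clf_head = nn.Linear(16, 2)                  # classification head (hypothetical)
reg_head = nn.Linear(16, 4)                  # bounding-box head (hypothetical)

cxe = nn.CrossEntropyLoss()
mse = nn.MSELoss()
# one optimizer over both heads, so one step updates both
opt = torch.optim.Adam(list(clf_head.parameters()) + list(reg_head.parameters()), lr=1e-3)

labels = torch.randint(0, 2, (8,))           # dummy class labels
boxes = torch.rand(8, 4)                     # dummy normalized boxes

opt.zero_grad()
loss = cxe(clf_head(feat), labels) + mse(reg_head(feat), boxes)  # combined CXE + MSE loss
loss.backward()                              # one backward pass through both heads
opt.step()
```

Because the two losses are summed into a single scalar, a single `backward()` call populates gradients for both heads, which is what lets one optimizer step tune the classifier and the regressor together.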
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from torch.optim import Adam
import matplotlib.pyplot as plt
import seaborn as sns
# Setting seeds to try and ensure we have the same results - this is not guaranteed across PyTorch releases.
import torch
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, SubsetRandomSampler
import random
from os import listdir
from shutil import copyfile
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from os import makedirs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
from tensorflow.keras import models
from tensorflow.keras.applications import *
from tensorflow.keras import layers
#creating directories for test and train data set
dataset_home = 'cat_vs_dog/'
subdirs = ['train/', 'test/']
for subdir in subdirs:
    # create label subdirectories
    labeldirs = ['dogs/', 'cats/']
    for labldir in labeldirs:
        newdir = dataset_home + subdir + labldir
        makedirs(newdir, exist_ok=True)
#load the csv file
df = pd.read_csv("cadod.csv")
df.head()
df.LabelName.replace({'/m/01yrx':'cat', '/m/0bt9lr':'dog'}, inplace=True)
dog_list = df[df.LabelName == 'dog']['ImageID']
cat_list = df[df.LabelName == 'cat']['ImageID']
#list(dog_list)
file_name_test = []
file_name_train = []
# moving images to test and train folder
random.seed(10)
# define ratio of pictures to use for test
test_ratio = 0.20
count_c = 0
count_d = 0
# copy training dataset images into subdirectories
src_directory = 'cadod/'
for file in listdir(src_directory):
    image_id = file.replace('.jpg','').replace('._','')
    src = src_directory + '/' + file
    dst_dir = 'train/'
    if random.random() < test_ratio:
        dst_dir = 'test/'
        file_name_test.append(image_id)
    if image_id in list(cat_list) and count_c < 500:
        dst = dataset_home + dst_dir + 'cats/' + file
        count_c += 1
        copyfile(src, dst)
        file_name_train.append(image_id)
    elif image_id in list(dog_list) and count_d < 500:
        dst = dataset_home + dst_dir + 'dogs/' + file
        count_d += 1
        copyfile(src, dst)
        file_name_train.append(image_id)
train_id = pd.DataFrame (file_name_train, columns = ['ImageID'])
train_id.head()
test_id = pd.DataFrame (file_name_test, columns = ['ImageID'])
test_id.head()
# Attach bounding-box coordinates to the training image IDs.
# DataFrame.join aligns on the index, so we merge on the ImageID column instead.
df_n = df[['ImageID', 'XMin', 'YMin', 'XMax', 'YMax']]
train_id_n = df_n.merge(train_id, on='ImageID', how='inner')
train_id_n.head(5)
# Same for the test image IDs
test_id_n = df_n.merge(test_id, on='ImageID', how='inner')
test_id_n.head(5)
expLog = pd.DataFrame(columns=["exp_name",
"Train Loss",
"Valid Loss",
"Test Loss",
])
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
transform_test = transforms.Compose([
#transforms.ToPILImage(),
transforms.Resize((128, 128)),
transforms.ToTensor(),
transforms.Normalize(mean=mean, std=std)
])
transform_train = transforms.Compose([
#transforms.ToPILImage(),
transforms.Resize((128, 128)),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(40),
transforms.ToTensor(),
transforms.Normalize(mean=mean, std=std)
#transforms.RandomAutocontrast()
])
train_it = datasets.ImageFolder('cat_vs_dog/train/', transform=transform_train)
test_it = datasets.ImageFolder('cat_vs_dog/test/', transform=transform_test)
dataset_size = len(train_it)
dataset_indices = list(range(dataset_size))
np.random.shuffle(dataset_indices)
dataset_size
idx2class = {v: k for k, v in train_it.class_to_idx.items()}
idx2class
dataset_size = len(train_it)
dataset_indices = list(range(dataset_size))
#dataset_indices[val_split_index:]
np.random.shuffle(dataset_indices)
val_split_index = int(np.floor(0.2 * dataset_size))
train_idx, val_idx = dataset_indices[val_split_index:], dataset_indices[:val_split_index]
train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)
bs_train = 16
bs_test = 4
bs_valid = 8
trainloader = DataLoader(dataset=train_it, shuffle=False, batch_size=bs_train, sampler=train_sampler)
valloader = DataLoader(dataset=train_it, shuffle=False, batch_size=bs_valid, sampler=val_sampler)
testloader = DataLoader(test_it, batch_size=bs_test, shuffle=False)
y_box_train = train_id_n[val_split_index:]
y_box_val = train_id_n[:val_split_index]
y_box_val.shape
for images, labels in trainloader:
    print(images.size(), labels.size())
    print(labels)
    break
for images, labels in valloader:
    print(images.size(), labels.size())
    print(labels)
    break
import numpy as np
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
y_train_tensor = torch.from_numpy(y_box_train[['XMin', 'YMin', 'XMax', 'YMax']].to_numpy())
y_val_tensor = torch.from_numpy(y_box_val[['XMin', 'YMin', 'XMax', 'YMax']].to_numpy())
y_test_tensor = torch.from_numpy(test_id_n[['XMin', 'YMin', 'XMax', 'YMax']].to_numpy())
#y_val_tensor
from torch.optim import Adam
#defining neural network layers
class cadod_c(nn.Module):
    def __init__(self):
        super(cadod_c, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(0.1)
        self.fc1 = nn.Linear(18000, 400)
        self.fc2 = nn.Linear(400, 2)
    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
model_c = cadod_c()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_c.parameters(), lr=0.0002, weight_decay = 3e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones = [500,1000,1500], gamma = 0.5)
#regression neural network
class cadod_r(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=3)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(in_features=14*14*16, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc5 = nn.Linear(in_features=32, out_features=4)
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        x = nn.Flatten()(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc3(x))
        r = self.fc5(x)
        return r
model_r = cadod_r()
# MSE loss for the regression head
loss_fn = torch.nn.MSELoss()
# One optimizer over both networks, so a single step on the combined loss updates
# the classifier and the regressor; per-group settings keep each model's
# original learning rate and weight decay.
optimizer = torch.optim.Adam([
    {'params': model_c.parameters(), 'lr': 0.0002, 'weight_decay': 3e-3},
    {'params': model_r.parameters(), 'lr': 0.0005, 'weight_decay': 3e-4},
])
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[500, 1000, 1500], gamma=0.5)
accuracy_stats = {
'train': [],
"val": []
}
loss_stats = {
'train': [],
"val": []
}
def binary_acc(y_pred, y_test):
    y_pred_tag = torch.log_softmax(y_pred, dim = 1)
    _, y_pred_tags = torch.max(y_pred_tag, dim = 1)
    correct_results_sum = (y_pred_tags == y_test).sum().float()
    acc = correct_results_sum/y_test.shape[0]
    acc = torch.round(acc * 100)
    return acc
from torch.utils.tensorboard import SummaryWriter
import numpy as np
writer = SummaryWriter()
model_c = model_c.to(device)
model_r = model_r.to(device)
num_epochs = 50
for e in range(num_epochs):
    cum_epoch_loss = 0
    cum_acc = 0
    batch_loss = 0
    train = 0
    model_c.train()
    model_r.train()
    # Training the models
    for batch, (images, labels) in enumerate(trainloader, 1):
        images = images.to(device)
        labels = labels.to(device)
        bbox = y_train_tensor[train:train+bs_train].to(device)  # boxes taken in dataframe order
        train += bs_train
        # Clear gradient buffers so gradients from the previous batch do not accumulate
        optimizer.zero_grad()
        label_pred = model_c(images).squeeze()  # classifier forward pass
        box_pred = model_r(images)              # regressor forward pass
        loss_1 = criterion(label_pred, labels)  # CXE loss
        acc = binary_acc(label_pred, labels)
        loss_2 = loss_fn(box_pred, bbox.float())  # MSE; both tensors are (batch, 4)
        loss = loss_1 + loss_2                  # combined loss
        loss.backward()                         # backpropagate the combined loss
        optimizer.step()                        # gradient update
        batch_loss += loss.item()
        cum_acc += acc.item()
        scheduler.step()
    # Evaluating the models on the validation set
    with torch.no_grad():
        model_c.eval()
        model_r.eval()
        val_epoch_loss = 0
        val_epoch_acc = 0
        val = 0
        for batch, (X_val_batch, y_val_batch) in enumerate(valloader, 1):
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
            y_box_val = y_val_tensor[val:val+bs_valid].to(device)
            y_val_pred = model_c(X_val_batch).squeeze()
            y_box_pred = model_r(X_val_batch)
            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = binary_acc(y_val_pred, y_val_batch)
            mse_loss = loss_fn(y_box_pred, y_box_val.float())
            val_epoch_loss += val_loss.item() + mse_loss.item()
            val_epoch_acc += val_acc.item()
            val += bs_valid
    # Saving the results for plotting
    loss_stats['train'].append(batch_loss/len(trainloader))
    loss_stats['val'].append(val_epoch_loss/len(valloader))
    accuracy_stats['train'].append(cum_acc/len(trainloader))
    accuracy_stats['val'].append(val_epoch_acc/len(valloader))
    print(f'Epoch({e+1}/{num_epochs})')
    print(f'Training loss : {batch_loss/len(trainloader)}')
    print(f'Training accuracy : {cum_acc/len(trainloader)}')
    print(f'Validation loss : {val_epoch_loss/len(valloader)}')
    print(f'Validation accuracy : {val_epoch_acc/len(valloader)}')
    writer.add_scalars('CXE + MSE Loss', {'Training': np.round(batch_loss/len(trainloader), 3),
                                          'Validation': np.round(val_epoch_loss/len(valloader), 3),}, e)
writer.close()
%load_ext tensorboard
%tensorboard --logdir=runs
# Getting the test loss
y_pred_list = []
y_true_list = []
with torch.no_grad():
    model_c.eval()
    model_r.eval()
    test_epoch_loss = 0
    test_epoch_acc = 0
    test = 0
    for batch, (X_test_batch, y_test_batch) in enumerate(testloader, 1):
        X_test_batch, y_test_batch = X_test_batch.to(device), y_test_batch.to(device)
        y_box_test = y_test_tensor[test:test+bs_test].to(device)  # slice by the test batch size
        y_test_pred = model_c(X_test_batch).squeeze()
        y_box_pred = model_r(X_test_batch)
        test_loss = criterion(y_test_pred, y_test_batch)
        test_acc = binary_acc(y_test_pred, y_test_batch)
        mse_loss = loss_fn(y_box_pred, y_box_test.float())
        test_epoch_loss += test_loss.item() + mse_loss.item()
        test_epoch_acc += test_acc.item()
        test += bs_test
print(f'Test loss : {test_epoch_loss/len(testloader)}')
The complex homegrown model did not converge well and resulted in a flat accuracy curve.
The limited computing power of IU Red and Google Colab made the model fail unexpectedly at run time.
We were not able to run all the EfficientDet models since the training time was very high; even 30 epochs took more than 4 hours to train.
Because of this, we were not able to fine-tune the model and improve the mAP score.
Plotting the image with the true and predicted values was an issue, since plt.imshow did not display the image.
Using an MLP and an FCN did not improve the accuracy to a great extent; the accuracy from these models is still comparable to our baseline model (i.e., only a slight improvement in validation accuracy).
Though data augmentation, regularization, changing the number of epochs, and modifying layers helped us overcome overfitting, the performance did not improve significantly, and the accuracy remained comparable to our baseline.
Our main objective was to classify images of cats and dogs and to identify the animal's location. This fundamental problem in computer vision is the basis of many other computer vision tasks. The field has grown rapidly in recent years, and current models can classify the image and detect the bounding boxes in a single pass.
In the previous phase, we used neural networks with data augmentation, dropout layers, and regularization to make the prediction, which reduced overfitting and yielded an accuracy of ~60%. In addition, we combined the CXE and MSE losses and backpropagated the combined loss to optimize the neural network, which reduced the loss steadily after each epoch. The final phase concentrated on the following:
Created homegrown linear and logistic regression models and combined the MSE and CXE losses; here we attained a training CXE + MSE loss of 48.5 and a validation loss of 59.
Built an EfficientDet model [D0 – D7] to train our classifier and regressor.
A multi-headed fully convolutional neural network (FCN) was also implemented, which gave us a test accuracy of 61%.
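As a rough illustration of the homegrown approach in the first bullet, the sketch below trains a logistic-regression classifier and a linear-regression box predictor with plain NumPy gradient descent on a combined CXE + MSE objective. All data here are random stand-ins, and the update rules are a generic batch-gradient-descent derivation, not our exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))          # stand-in for flattened image features
y_cls = rng.integers(0, 2, 100)             # stand-in cat/dog labels
y_box = rng.random((100, 4))                # stand-in normalized boxes (XMin, YMin, XMax, YMax)

W_c, b_c = np.zeros(20), 0.0                # logistic-regression (classifier) parameters
W_r, b_r = np.zeros((20, 4)), np.zeros(4)   # linear-regression (box) parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, n = 0.1, len(X)
losses = []
for _ in range(200):
    p = sigmoid(X @ W_c + b_c)              # class probabilities
    box = X @ W_r + b_r                     # predicted boxes
    cxe = -np.mean(y_cls * np.log(p + 1e-9) + (1 - y_cls) * np.log(1 - p + 1e-9))
    mse = np.mean((box - y_box) ** 2)
    losses.append(cxe + mse)                # combined CXE + MSE loss
    # batch gradients of the combined loss w.r.t. each parameter set
    W_c -= lr * X.T @ (p - y_cls) / n
    b_c -= lr * np.mean(p - y_cls)
    W_r -= lr * 2 * X.T @ (box - y_box) / (n * 4)
    b_r -= lr * 2 * np.mean(box - y_box, axis=0) / 4
```

Since both objectives are convex in their own parameters and share no weights here, the combined loss decreases monotonically for a small enough learning rate; in the neural-network version the two heads share the backbone, which is what couples the tasks.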
From the above table, though there were slight improvements in the FCN model in terms of training accuracy, the validation accuracy remained consistent and is below that of our baseline model.